add ITT recipe #2072


Merged: 19 commits merged into pytorch:master from jingxu10/itt on Oct 26, 2022

Conversation

jingxu10
Contributor

@jingxu10 jingxu10 commented Oct 7, 2022

Add tutorial for ITT feature at pytorch/pytorch#63289

@netlify

netlify bot commented Oct 7, 2022

Deploy Preview for pytorch-tutorials-preview ready!

Name Link
🔨 Latest commit 068d53b
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/63599f268165430009be6ec6
😎 Deploy Preview https://deploy-preview-2072--pytorch-tutorials-preview.netlify.app

@svekars svekars added the 1.13 label Oct 7, 2022
Contributor

@svekars svekars left a comment


Some editorial suggestions.

Requirements
------------

* PyTorch 1.13+
Contributor

Suggested change
* PyTorch 1.13+
* PyTorch v1.13 or later


In this recipe, you will learn:

* An overview of Intel® VTune™ Profiler
Contributor

Suggested change
* An overview of Intel® VTune™ Profiler
* What is Intel® VTune™ Profiler

In this recipe, you will learn:

* An overview of Intel® VTune™ Profiler
* An overview of the Instrumentation and Tracing Technology (ITT) API
Contributor

Suggested change
* An overview of the Instrumentation and Tracing Technology (ITT) API
* What is the Instrumentation and Tracing Technology (ITT) API

* PyTorch 1.13+
* Intel® VTune™ Profiler

The instructions for installing PyTorch are available at `pytorch.org <https://pytorch.org/>`_.
Contributor

Suggested change
The instructions for installing PyTorch are available at `pytorch.org <https://pytorch.org/>`_.
The instructions for installing PyTorch are available at `pytorch.org <https://pytorch.org/get-started/locally/>`__.


For those who are familiar with Intel Architecture, Intel® VTune™ Profiler provides a rich set of metrics to help users understand how the application executed on Intel platforms, and thus have an idea where the performance bottleneck is.

More detailed information, including getting started guide, are available `here <https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html>`_.
Contributor

Suggested change
More detailed information, including getting started guide, are available `here <https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html>`_.
More detailed information, including a Getting Started guide, are available `on the Intel website <https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html>`__.


Right side of the windows is split into 3 parts: `WHERE` (top left), `WHAT` (bottom left), and `HOW` (right). With `WHERE`, you can assign a machine where you want to run the profiling on. With `WHAT`, you can set path of the application that you want to profile. To profile a PyTorch script, it is recommended to wrap all manual steps, including activate a conda environment and setting required environment variable, into a bash script, then profile this bash script. In the screenshot above, we wrapped all steps into the `launch.sh` bash script and profile `bash` with parameter to be `<path_of_launch.sh>`. In the right side `HOW`, you can choose whatever type that you would like to profile. Details can be found at `Intel® VTune™ Profiler user guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`_.

With a successful profiling with ITT, you can open `Platform` tab of the profiling result to see labels in Intel® VTune™ Profiler timeline. All operators starting with `aten::` are operators labeled implicitly by the ITT feature in PyTorch. Labels `iteration_N` are explicitly labeled with specific APIs `torch.profiler.itt.range_push()`, `torch.profiler.itt.range_pop()` or `torch.profiler.itt.range()` scope. Please check the sample code in next section for details.
Contributor

Suggested change
With a successful profiling with ITT, you can open `Platform` tab of the profiling result to see labels in Intel® VTune™ Profiler timeline. All operators starting with `aten::` are operators labeled implicitly by the ITT feature in PyTorch. Labels `iteration_N` are explicitly labeled with specific APIs `torch.profiler.itt.range_push()`, `torch.profiler.itt.range_pop()` or `torch.profiler.itt.range()` scope. Please check the sample code in next section for details.
With a successful profiling with ITT, you can open `Platform` tab of the profiling result to see labels in the Intel® VTune™ Profiler timeline. All operators starting with `aten::` are operators labeled implicitly by the ITT feature in PyTorch. Labels `iteration_N` are explicitly labeled with specific APIs `torch.profiler.itt.range_push()`, `torch.profiler.itt.range_pop()` or `torch.profiler.itt.range()` scope. Please check the sample code in the next section for details.

A short sample code showcasing how to use PyTorch ITT APIs
----------------------------------------------------------

Sample code below is the script that was used for profiling in the screenshots above.
Contributor

Suggested change
Sample code below is the script that was used for profiling in the screenshots above.
The sample code below is the script that was used for profiling in the screenshots above.


Sample code below is the script that was used for profiling in the screenshots above.

The topology is formed by 2 operators, `Conv2d` and `Linear`. Three iterations of inference were performed. Each iteration was labled by PyTorch ITT APIs as text string `iteration_N`. Either pair of `torch.profile.itt.range_push` and `torch.profile.itt.range_pop` or `torch.profile.itt.range` scope does the customized labeling feature.
Contributor

Suggested change
The topology is formed by 2 operators, `Conv2d` and `Linear`. Three iterations of inference were performed. Each iteration was labled by PyTorch ITT APIs as text string `iteration_N`. Either pair of `torch.profile.itt.range_push` and `torch.profile.itt.range_pop` or `torch.profile.itt.range` scope does the customized labeling feature.
The topology is formed by two operators, `Conv2d` and `Linear`. Three iterations of inference were performed. Each iteration was labeled by PyTorch ITT APIs as text string `iteration_N`. Either pair of `torch.profile.itt.range_push` and `torch.profile.itt.range_pop` or `torch.profile.itt.range` scope does the customized labeling feature.
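The described two-operator topology with per-iteration labels can be sketched as follows. This is a minimal illustration, not the recipe's actual sample.py: the layer sizes, tensor shapes, and `ToyModel` name are hypothetical, assuming PyTorch 1.13 or later.

```python
import torch
from torch.profiler import itt

# Hypothetical two-operator topology (Conv2d followed by Linear);
# the layer sizes here are illustrative only.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
        self.linear = torch.nn.Linear(8 * 30 * 30, 10)

    def forward(self, x):
        x = self.conv(x)
        return self.linear(x.flatten(1))

model = ToyModel().eval()
data = torch.rand(1, 3, 32, 32)

with torch.no_grad(), torch.autograd.profiler.emit_itt():
    for i in range(3):
        itt.range_push(f"iteration_{i}")  # appears as iteration_N in the VTune timeline
        out = model(data)
        itt.range_pop()

print(out.shape)  # torch.Size([1, 10])
```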


#!/bin/bash

# Retrive the directory path where contains both the sample.py and launch.sh so that this script can be invoked from any directory
Contributor

Suggested change
# Retrive the directory path where contains both the sample.py and launch.sh so that this script can be invoked from any directory
# Retrive the directory path where the path contains both the sample.py and launch.sh so that this script can be invoked from any directory

@jingxu10
Contributor Author

jingxu10 commented Oct 8, 2022

Some editorial suggestions.

Done amendments

@svekars svekars requested a review from chauhang October 10, 2022 21:47
@svekars
Contributor

svekars commented Oct 10, 2022

@chauhang can you please assign someone to review this PR? Thanks!

@svekars svekars requested a review from malfet October 11, 2022 17:30
Comment on lines 59 to 65
To enable this feature, codes which are expected to be labeled should be invoked under a `torch.autograd.profiler.emit_itt()` scope. For example:

.. code:: python3

with torch.autograd.profiler.emit_itt():
<codes...>

Contributor

English is not my native language, but "codes" does not sound like a correct plural of "code". Perhaps it can be replaced with something like "code to be profiled"

Suggested change
To enable this feature, codes which are expected to be labeled should be invoked under a `torch.autograd.profiler.emit_itt()` scope. For example:
.. code:: python3
with torch.autograd.profiler.emit_itt():
<codes...>
To enable this feature, code block, which is expected to be labeled should be invoked within `torch.autograd.profiler.emit_itt()` scope. For example:
.. code:: python3
with torch.autograd.profiler.emit_itt():
<code-to-be-profiled...>
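For concreteness, a minimal runnable sketch of this scoping pattern; the toy layer and tensor shapes are illustrative stand-ins for real profiled code, assuming PyTorch 1.13 or later:

```python
import torch

# Illustrative stand-ins for the code to be profiled.
model = torch.nn.Conv2d(3, 8, kernel_size=3)
data = torch.rand(1, 3, 32, 32)

# Everything executed inside this scope gets ITT labels when the
# script runs under Intel VTune Profiler; otherwise it is effectively a no-op.
with torch.autograd.profiler.emit_itt():
    with torch.no_grad():
        out = model(data)

print(out.shape)  # torch.Size([1, 8, 30, 30])
```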

Comment on lines 133 to 146
The `launch.sh` bash script to wrap all manual steps is shown below.

.. code:: bash

# launch.sh

#!/bin/bash

# Retrive the directory path where the path contains both the sample.py and launch.sh so that this script can be invoked from any directory
BASEFOLDER=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
source ~/miniconda3/bin/activate
conda activate ipex_py38
cd ${BASEFOLDER}
python sample.py
Contributor

Not sure how this section is specific to ITT? The user does not need to install (or use) conda to benefit from the ITT feature. Nor should it be bound to Python 3.8, as the environment name suggests. And since the example does not have any data dependencies, changing the working directory to the base folder is superfluous as well, isn't it?

But the paragraph about the ITT filename and the folder where it will be generated is probably relevant, isn't it?

Contributor Author

@jingxu10 jingxu10 Oct 11, 2022

To use VTune to profile a python script, it is always recommended to wrap the execution of the python script into a bash script. Because executing a python script requires the configuration of a Python environment, which is not always there by default. Also, launching the script from vtune could be done in a directory that doesn't have the python script, so it is better to switch the active directory into the desired folder inside the bash script execution.

Detailed explanation is there in the section above:
To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script. In the screenshot above, we wrapped all steps into the launch.sh bash script and profile bash with the parameter to be <path_of_launch.sh>.

I restated the description to make it more descriptive and related to vtune. Also I changed the specific python version and conda distribution into a generic description. Does the updated content below look good to you?

The `launch.sh` bash script, mentioned in the Intel® VTune™ Profiler GUI screenshot, to wrap all manual steps is shown below.

.. code:: bash

   # launch.sh

   #!/bin/bash

   # Retrive the directory path where the path contains both the sample.py and launch.sh so that this bash script can be invoked from any directory
   BASEFOLDER=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
   <Activate a Python environment>
   cd ${BASEFOLDER}
   python sample.py

Contributor

@malfet malfet left a comment

LGTM, with two minor suggestions:

  • Get rid of <codes>
  • Explain better/get rid of starter script

@jingxu10
Contributor Author

Hi @malfet, @svekars, all CI checks have passed. Would you take a final review? Shall I do the merge with pytorchbot, or will you merge this PR? Thank you.

@svekars
Contributor

svekars commented Oct 12, 2022

@jingxu10 we are trying to get a review from the partners team. Stay tuned.


With a successful profiling with ITT, you can open `Platform` tab of the profiling result to see labels in the Intel® VTune™ Profiler timeline. All operators starting with `aten::` are operators labeled implicitly by the ITT feature in PyTorch. Labels `iteration_N` are explicitly labeled with specific APIs `torch.profiler.itt.range_push()`, `torch.profiler.itt.range_pop()` or `torch.profiler.itt.range()` scope. Please check the sample code in the next section for details.
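As a sketch of the scope-based variant of the labeling APIs mentioned here, the following uses the `torch.profiler.itt.range()` context manager; the layer and tensor shapes are illustrative, assuming PyTorch 1.13 or later:

```python
import torch
from torch.profiler import itt

# Illustrative single-operator workload; shapes are arbitrary.
linear = torch.nn.Linear(4, 2)
x = torch.rand(1, 4)

with torch.no_grad(), torch.autograd.profiler.emit_itt():
    for i in range(3):
        # scope-based equivalent of a range_push/range_pop pair
        with itt.range(f"iteration_{i}"):
            out = linear(x)

print(out.shape)  # torch.Size([1, 2])
```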

.. figure:: /_static/img/itt_tutorial/vtune_timeline.png
Member

Could you please elaborate a bit more on how to read this picture? What do the big brown boxes mean? Or what does someone learn once they look at this picture?

A big convolution gets broken into multiple smaller ones on different threads?

@msaroufim
Member

msaroufim commented Oct 12, 2022

I agree with @malfet's feedback and also the starting section is a bit too long so I'd shorten that substantially and maybe get rid of the second image otherwise looks good. I'll let @svekars handle merging once the feedback is addressed

Contributor

@HamidShojanazeri HamidShojanazeri left a comment

Thanks @jingxu10 this is a great addition to the recipes.

Requirements
------------

* PyTorch v1.13 or later
Contributor

nit : PyTorch 1.13 or later

:width: 100%
:align: center

Three sample results are available in the left side navigation bar under `sample (matrix)` project. If you do not want profiling results appear in this default sample project, you can create a new project via the button `New Project...` under the blue `Configure Analysis...` button. To start a new profiling, click the blue `Configure Analysis...` button to initiate configuration of the profiling.
Contributor

typos : on the left side

Contributor

Are the three dots after "New Project..." and "Configure Analysis..." intended?

Contributor Author

Yes, the three dots are shown in the VTune GUI buttons, so I copied them.

2. Explicit invocation: If customized labeling is needed, users can use APIs mentioned at `PyTorch Docs <https://pytorch.org/docs/stable/profiler.html#intel-instrumentation-and-tracing-technology-apis>`__ explicitly to label a desired range.


To enable this feature, codes which are expected to be labeled should be invoked under a `torch.autograd.profiler.emit_itt()` scope. For example:
Contributor

does "this" here refers to the explicit invocation?

Contributor Author

Yes, this refers to the explicit invocation. I'll make it clear in the tutorial.

:width: 100%
:align: center

The right side of the windows is split into 3 parts: `WHERE` (top left), `WHAT` (bottom left), and `HOW` (right). With `WHERE`, you can assign a machine where you want to run the profiling on. With `WHAT`, you can set the path of the application that you want to profile. To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script. In the screenshot above, we wrapped all steps into the `launch.sh` bash script and profile `bash` with the parameter to be `<path_of_launch.sh>`. In the right side `HOW`, you can choose whatever type that you would like to profile. Details can be found at `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.
Contributor

typos : In the right side `HOW --> on the right side 'How'

Contributor

can you please clarify on " you can choose whatever type that you would like to profile."

Contributor Author

Intel VTune Profiler provides a bunch of profiling types (https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html) that users can choose from.

:width: 100%
:align: center

The right side of the windows is split into 3 parts: `WHERE` (top left), `WHAT` (bottom left), and `HOW` (right). With `WHERE`, you can assign a machine where you want to run the profiling on. With `WHAT`, you can set the path of the application that you want to profile. To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script. In the screenshot above, we wrapped all steps into the `launch.sh` bash script and profile `bash` with the parameter to be `<path_of_launch.sh>`. In the right side `HOW`, you can choose whatever type that you would like to profile. Details can be found at `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.
Contributor

@HamidShojanazeri HamidShojanazeri Oct 12, 2022

I wonder how the workflow works, is this correct?

  1. Add the context manager `with torch.autograd.profiler.emit_itt():`
  2. Run the code --> it saves the traces somewhere
  3. Load the traces in the profiler?

Or: run the profiler, then with `WHERE`, `WHAT`, and `HOW` you can set the path of the application that you want to profile, and it automatically captures the traces and shows them to you. I assume adding the context manager is additional to what it does for labeling a part, not the only way to capture the profiles.

The statement "you can set the path of the application that you want to profile" suggests the second workflow.
Can you please clarify the workflow?

Contributor Author

It's the second way. Users launch VTune first, then launch the application from VTune. `WHAT` in VTune is where you tell VTune which application to launch and profile.

@jingxu10
Contributor Author

I agree with @malfet's feedback and also the starting section is a bit too long so I'd shorten that substantially and maybe get rid of the second image otherwise looks good. I'll let @svekars handle merging once the feedback is addressed

The second image, with the descriptions below it, introduces how to use Intel VTune Profiler to profile a PyTorch script. I'll shorten the two starting "What is ..." sections.

@jingxu10
Contributor Author

Hi @msaroufim @HamidShojanazeri, would you please review the updated content? Does it align with your expectations?

2. Explicit invocation: If customized labeling is needed, users can use APIs mentioned at `PyTorch Docs <https://pytorch.org/docs/stable/profiler.html#intel-instrumentation-and-tracing-technology-apis>`__ explicitly to label a desired range.


To enable explicit invocation, codes which are expected to be labeled should be invoked under a `torch.autograd.profiler.emit_itt()` scope. For example:
Contributor

You still have codes here

Again, English is not my native language, but please compare the results of https://www.bing.com/search?q=codes vs https://www.bing.com/search?q=code

Contributor Author

@jingxu10 jingxu10 Oct 13, 2022

Oh, your concern is about the plural. I'm not a native speaker either. Changed it to "code" anyway. Done


#!/bin/bash

# Retrive the directory path where the path contains both the sample.py and launch.sh so that this bash script can be invoked from any directory


@jingxu10 is it Retrieve the directory path here? Might be a typo.

Contributor Author

Good catch. Corrected.

@jingxu10
Contributor Author

If the updates look good to you, please help to merge it.
Thank you.


As illustrated on the right side navigation bar, brown portions in the timeline rows show CPU usage of individual threads. The percentage of the height of a thread row that the brown portion occupies at a timestamp aligns with the CPU usage of that thread at that timestamp. Thus, it is intuitive from this timeline to understand the following:

1. How well CPU cores are utlized on each thread.
Contributor

@jingxu10, utlized -> utilized ?

Contributor Author

done

@jingxu10
Contributor Author

Hi, may I know the status? Are we waiting for more approvals, or is it OK to merge? Please feel free to let me know if there are further comments.

@svekars
Contributor

svekars commented Oct 17, 2022

@jingxu10 I don't think there are any more comments left and the PR looks good. However, we typically merge tutorials for a new release a couple of days before the release. There are no action items for you as of now, and I will keep you posted.

@jingxu10
Contributor Author

Got it.
Thanks a lot!

@malfet malfet changed the title add itt tutorial add ITT recipe Oct 25, 2022
@svekars svekars merged commit adda5fe into pytorch:master Oct 26, 2022
@jingxu10 jingxu10 deleted the jingxu10/itt branch November 9, 2022 20:29